数据驱动的预测方法可以有效,准确地将蛋白质序列转化为生物活性结构,对于科学研究和治疗发展非常有价值。使用共同进化信息确定准确的折叠格局是现代蛋白质结构预测方法的成功基础。作为最新的状态,AlphaFold2显着提高了准确性,而无需进行明确的共同进化分析。然而,其性能仍然显示出对可用序列同源物的强烈依赖。我们研究了这种依赖性的原因,并提出了一种元生成模型Evogen,以弥补较差的MSA靶标的Alphafold2的表现不佳。 Evogen使我们能够通过降低搜索的MSA或生成虚拟MSA来操纵折叠景观,并帮助Alphafold2在低数据表方面准确地折叠,甚至通过单序预测来实现令人鼓舞的性能。能够用很少的MSA做出准确的预测,不仅可以更好地概括为孤儿序列的Alphafold2,而且使其在高通量应用程序中的使用民主化。此外,Evogen与AlphaFold2结合产生了一种概率结构生成方法,该方法可以探索蛋白质序列的替代构象,并且序列生成的任务意识可区分算法将使包括蛋白质设计在内的其他相关任务受益。
translated by 谷歌翻译
蛋白质是人类生命的重要组成部分,其结构对于功能和机制分析很重要。最近的工作表明了AI驱动方法对蛋白质结构预测的潜力。但是,新模型的开发受到数据集和基准测试培训程序的限制。据我们所知,现有的开源数据集远不足以满足现代蛋白质序列相关研究的需求。为了解决这个问题,我们介绍了具有高覆盖率和多样性的第一个百万级蛋白质结构预测数据集,称为PSP。该数据集由570K真实结构序列(10TB)和745K互补蒸馏序列(15TB)组成。此外,我们还提供了该数据集上SOTA蛋白结构预测模型的基准测试训练程序。我们通过参与客串比赛验证该数据集的实用程序进行培训,我们的模特赢得了第一名。我们希望我们的PSP数据集以及培训基准能够为AI驱动的蛋白质相关研究提供更广泛的AI/生物学研究人员社区。
translated by 谷歌翻译
探针车的使用日益增长会产生大量的GNS数据。受卫星定位技术的限制,进一步提高地图匹配的准确性是具有挑战性的工作,尤其是对于低频轨迹。当与轨迹匹配时,自我车辆的当前旅行时空信息对于数据量最少而言最有用。此外,还有大量其他数据,例如其他车辆的状态和过去的预测结果,但是很难提取有用的信息来匹配地图和推断路径。大多数地图匹配研究仅使用自我车辆的数据,而忽略了其他车辆的数据。基于它,本文设计了一种新的地图匹配方法,以充分利用“大数据”。首先,我们根据与本匹配探针的空间和时间距离将所有数据分为四组,这使我们能够对其有用性进行排序。然后,我们设计了三种不同的方法来从它们中提取有价值的信息(分数):速度和轴承的分数,历史用法的分数以及使用光谱图马尔可夫中立网络的交通状态分数。最后,我们使用修改后的TOP-K最短路径方法来搜索椭圆区域内的候选路径,然后使用Fused分数推断路径(投影位置)。我们使用中国的现实世界数据集测试了针对基线算法的建议方法。结果表明,所有评分方法都可以增强地图匹配的精度。此外,我们的方法优于其他方法,尤其是当GNSS探测频率小于0.01 Hz时。
translated by 谷歌翻译
心血管疾病是全球死亡的主要原因,是一种与年龄有关的疾病。了解衰老期间心脏的形态和功能变化是一个关键的科学问题,其答案将有助于我们定义心血管疾病的重要危险因素并监测疾病进展。在这项工作中,我们提出了一种新型的条件生成模型,以描述衰老过程中心脏3D解剖学的变化。提出的模型是灵活的,可以将多个临床因素(例如年龄,性别)整合到生成过程中。我们在心脏解剖学的大规模横截面数据集上训练该模型,并在横截面和纵向数据集上进行评估。该模型在预测衰老心脏的纵向演化和对其数据分布进行建模方面表现出了出色的表现。
translated by 谷歌翻译
了解脑损伤的强度特征是定义神经系统研究和预测疾病负担和结局的基于图像的生物标志物的关键。在这项工作中,我们提出了一种基于前景的新型生成方法,用于对局部病变特征进行建模,该方法既可以在健康图像上产生合成病变,又可以从病理图像中综合受试者特异性的伪健康图像。此外,该方法可以用作数据增强模块,以生成用于训练大脑图像分割网络的合成图像。在磁共振成像(MRI)上获得的多发性硬化症(MS)脑图像的实验表明,所提出的方法可以生成高度逼真的伪健康和伪病理学脑图像。与传统的数据增强方法以及最近的病变感知数据增强技术Carvemix相比,使用合成图像进行数据扩展可改善大脑图像分割的性能。该代码将在https://github.com/dogabasaran/lesion-synthesis中发布。
translated by 谷歌翻译
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
translated by 谷歌翻译